Abstract: Modern digital systems powered by artificial intelligence (AI) have permeated many aspects of society and play an instrumental role in critical applications. This proliferation has accelerated recently, exemplified by the emergence of groundbreaking systems such as GPT-4 and MidJourney, giving rise to the concern that “no one, not even their creators, can understand, predict, or reliably control” these technologies. Predictive performance is central to modern AI-informed decision-making systems and is often measured by accuracy on stylized benchmarks; however, it is equally important to “know what is unknown”: one must also assess the inherent uncertainty in predictions in order to understand likely failure modes of decision-making and to guarantee the transparency and consistency of AI systems.
In this talk, I will introduce our recent work on distribution-free uncertainty quantification, which provides tight finite-sample bounds for a rich class of statistical functionals of quantile functions and enables control of the dispersion of the loss distribution, i.e., the extent to which different members of a population experience unequal effects of algorithmic decisions. We develop multiple ways to obtain tight and practically useful bounds for guiding modern AI deployment, including inverting test statistics and a novel numerical method. To demonstrate the power of the framework, we apply it to large language models such as GPT-4, using it to guide prompt engineering: prompts are selected based on provable bounds on families of informative risk measures, for problems such as chatbot harmfulness and summarization of clinical patient notes.
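To convey the flavor of distribution-free finite-sample bounds on quantile functionals, the sketch below computes an upper confidence bound on a quantile of a loss distribution via the classical Dvoretzky–Kiefer–Wolfowitz (DKW) inequality. This is only an illustrative textbook construction, not the talk's actual method; the function name and the simulated loss data are hypothetical.

```python
import numpy as np

def dkw_quantile_ucb(losses, q=0.9, delta=0.05):
    """Distribution-free upper confidence bound on the q-th quantile of losses.

    By the DKW inequality, with probability >= 1 - delta the true CDF F
    satisfies F(x) >= F_n(x) - eps for all x, where
    eps = sqrt(log(2 / delta) / (2 n)).
    Hence the empirical (q + eps)-quantile upper-bounds the true q-quantile.
    Illustrative only; not the construction used in the talk.
    """
    x = np.sort(np.asarray(losses, dtype=float))
    n = len(x)
    eps = np.sqrt(np.log(2.0 / delta) / (2.0 * n))
    level = min(q + eps, 1.0)
    # Order statistic whose empirical CDF value reaches `level`.
    k = min(int(np.ceil(level * n)), n) - 1
    return x[k]

# Hypothetical loss sample (e.g., per-example losses of a deployed model).
rng = np.random.default_rng(0)
losses = rng.exponential(scale=1.0, size=2000)
print(dkw_quantile_ucb(losses, q=0.9, delta=0.05))
```

The bound holds for any loss distribution, at the cost of some conservatism; tighter bounds (e.g., by inverting test statistics, as in the talk) shrink the gap between the bound and the empirical quantile.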